Optional retrieval result summarisation in CodeCompanion.nvim query tool #179
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##             main     #179   +/-   ##
=======================================
  Coverage   99.49%   99.49%
=======================================
  Files          21       21
  Lines        1589     1589
=======================================
  Hits         1581     1581
  Misses          8        8

☔ View full report in Codecov by Sentry.
Force-pushed from eba101d to 6ce9a03
I've managed to make requests from an adapter, but handling the async requests from a sync context is TRICKY. @olimorris any suggestions on how I might be able to simplify this?
Could you hook into CodeCompanion's event system and listen for …? Alternatively, we could add a sync method on CodeCompanion's …
I hadn't thought of that; I'll look into it. I'll probably still need to work out how to make the wait non-blocking for the main thread, though.
That would be very nice to have for this PR. For me, the real tricky bit is to make it NOT block the main UI. tbf I feel like I'm spoiled by modern async like Python asyncio, and have no idea how to work with coroutines directly 😭
It's high up on my list after I've got the agent mode sorted in CodeCompanion. I've had this exact conversation with sooooo many LLMs over the last 12 months 😆. I took a lot of inspiration from the lua-async-await library some time ago. I never ended up using it but what she's done in 90 LOC blew my mind.
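For readers following along, here is a minimal sketch of the coroutine-based async/await pattern that lua-async-await demonstrates. The `summarise` function below is a hypothetical stand-in for a callback-style request, not CodeCompanion's actual client:

```lua
-- async(fn) drives `fn` as a coroutine: every time `fn` awaits a thunk,
-- the coroutine yields, and the thunk's callback resumes it later.
local function async(fn)
  return function(...)
    local co = coroutine.create(fn)
    local function step(...)
      local ok, thunk = coroutine.resume(co, ...)
      assert(ok, thunk)
      if coroutine.status(co) ~= "dead" then
        thunk(step) -- hand `step` to the thunk as its continuation
      end
    end
    step(...)
  end
end

-- await(thunk) suspends the current coroutine until the thunk completes.
local function await(thunk)
  return coroutine.yield(thunk)
end

-- Hypothetical callback-style API: calls `cb(summary)` when a reply arrives.
local function summarise(text)
  return function(cb)
    cb("summary of: " .. text:sub(1, 20)) -- stand-in for a real HTTP request
  end
end

-- Reads top-to-bottom like sync code, but never blocks the main thread.
local run = async(function(doc)
  local summary = await(summarise(doc))
  print(summary)
end)

run("a long retrieval result ...")
```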
Force-pushed from 0ced3a4 to 1c562a1
Force-pushed from 1c562a1 to 86b59b4
@olimorris I've managed to implement this without blocking the main UI by putting the summarisation logic into the …
Or, we could move most of the result handling from the output handler to …
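A sketch of what that split could look like. The four-argument async command form, the `query_and_summarise` helper, and the field names below are assumptions drawn from this thread, not CodeCompanion's documented API:

```lua
-- Hypothetical helper that runs the query, summarises the results off the
-- main thread, and invokes `cb` when everything is ready.
local function query_and_summarise(args, cb)
  cb({ "summarised result 1", "summarised result 2" }) -- stand-in
end

local tool = {
  name = "vectorcode_query",
  cmds = {
    -- Async command: instead of returning a value, it hands the already
    -- summarised results to `cb`, so the UI never blocks on the summariser.
    function(self, args, _input, cb)
      query_and_summarise(args, function(results)
        cb({ status = "success", data = results })
      end)
    end,
  },
  output = {
    -- The output handler is then only responsible for formatting `data`
    -- for the chat buffer, with no summarisation work left to do.
    success = function(self, agent, cmd, stdout)
      return table.concat(stdout[#stdout], "\n")
    end,
  },
}
```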
I've managed to work around the rate limit by including the full results in one request (obviously, this needs the summariser to be good at long context, but it's MUCH easier to implement than a rate limiter).
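A rough illustration of the single-request approach; the fields on each result are assumptions, and the real shape of `VectorCode.Result` is defined in the diff:

```lua
-- Build one long prompt out of all retrieval results so that only a single
-- summarisation request is sent, instead of one request per result.
local function build_summary_prompt(results)
  local parts = { "Summarise each of the documents below individually:" }
  for _, result in ipairs(results) do
    parts[#parts + 1] = ("<document path=%q>\n%s\n</document>")
      :format(result.path, result.document)
  end
  return table.concat(parts, "\n")
end

print(build_summary_prompt({
  { path = "lua/vectorcode/init.lua", document = "local M = {} --[[ ... ]]" },
}))
```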
@ravitemer, any suggestions on this feature? I'm asking because you've also done summarisation (from a different perspective), and maybe you can spot something I'm missing?
…y one request to the summariser.
Force-pushed from 4e90c51 to b517c76
@Davidyz This looks amazing! I didn't follow the previous commits, but the current implementation seems solid. From a user's perspective, when I want to go through some repo to let the LLM get an overview (a repo map, but better), I would certainly use the summary feature. The only thing I can think of is whether you can make the summary option dynamic, through some variable or by adding a …

And just an observation: another edge case might be managing …
I thought this could be done through the adapter configuration, so I didn't do it here. The OpenAI API, for example, offers …
In the initial iterations, I sent a request for each result (document or chunk). This simply doesn't work OOTB because of rate limits (imagine 50 simultaneous requests hitting a server with a rate limit of 10 per minute). I'm open to the possibility, but it'll be very tricky to implement, and I'll have to think about it. Maybe this should be upstreamed, so that each adapter instance has its own debounce counter and multiple requests can't all happen at the same time. With more extensions making their own requests (outside of the chat buffer itself), I think this will actually make sense.
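The per-adapter serialisation described here could be as simple as a callback queue; this is a toy sketch of the idea, not CodeCompanion's actual client:

```lua
-- Toy request queue: `send` functions are started one at a time; each must
-- call `done()` when its response arrives, which starts the next request.
local Queue = { pending = {}, busy = false }

function Queue:push(send)
  table.insert(self.pending, send)
  self:drain()
end

function Queue:drain()
  if self.busy or #self.pending == 0 then
    return
  end
  self.busy = true
  local send = table.remove(self.pending, 1)
  send(function()
    self.busy = false
    self:drain()
  end)
end

-- Usage: 50 queued summarisation requests now reach the server sequentially.
Queue:push(function(done)
  print("request 1 finished")
  done()
end)
```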
As for this one: currently there's the …
Thanks. Didn't see that. That solves it then.
Agreed. It looks solid to me.
In a quick (non-rigorous) test, the summarisation reduced the query result from a 50k+ character string to an 18k+ character string, roughly a 60% reduction in the token count for the tool result!
…esult summarisation
This PR aims to implement summarisation for retrieval results.
This would:
- … VectorCode.Result into a function
- Implement a thresholding mechanism that only triggers the summarisation if the document is too long
- Provide a callback that decides whether summarisation should kick in for each tool call

Example config:
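(A hypothetical sketch of what these options could look like; the field names below are illustrative only, and the PR diff defines the real interface.)

```lua
-- Illustrative only: enable summarisation for the CodeCompanion query tool,
-- with a per-tool-call callback deciding whether it should kick in.
local summarise_opts = {
  enabled = true,
  -- Hypothetical callback form: return true to summarise this tool call.
  should_summarise = function(results)
    local total = 0
    for _, result in ipairs(results) do
      total = total + #result.document
    end
    return total > 4096 -- only summarise when the combined result is long
  end,
  adapter = nil, -- fall back to the chat adapter when unset
}
```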
Related PR: